Entry Name: giCentre-Wood-MC2

VAST Challenge 2017
Mini challenge 2

Team members:

Jo Wood, giCentre, City University London, j.d.wood@city.ac.uk    PRIMARY

Student Team: NO

Tools Used:

Bespoke software designed and built using Processing and the giCentre Utils library written by the giCentre at City, University of London.

Approximately how many hours were spent working on this submission in total?

Approximately 10 hours to construct software and perform analysis. A further 10 hours assembling the report and video.

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2017 is complete?

YES

Video:


Questions:


MC2.1 Characterize the sensors' performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture?

The nine sensors, each measuring four chemical concentrations, function almost continuously during the three month-long sample periods. Readings are logged at hourly intervals, 24 hours per day. Figure 1 shows the sensor readings (9 sensors; 4 chemical detectors each; 3 month-long periods). Unexpected gaps in the logged readings are symbolised as red discs, with red vertical lines to aid accurate time comparison with other features. In this figure and those below, the vertical scale measures the square root of chemical concentration in parts per million, so that variation among the less extreme values is visible in addition to spikes in chemical concentration.

Figure 1
Figure 1 The 9 sensors' readings over the three month-long periods showing missing readings in red. (click image for full size)

There are two broad patterns to the apparent absences represented by red discs. The first is a set of timestamps at which missing data points coincide across sensors. Figure 2 shows 7 points in time where recordings are largely absent: midnight at the start of 2nd April, 6th April, 2nd August, 4th August, 7th August, 2nd December and 7th December. The only readings at these times (circled in Figure 2) were on the 2nd August for Sensor 3 (AGOC-3A and Methylosmolene) and on the 7th December for Sensor 6 (AGOC-3A), Sensor 7 (AGOC-3A and Appluimonia) and Sensor 8 (AGOC-3A and Methylosmolene).

Figure 2
Figure 2 Missing midnight readings (in red) over the three month-long periods. (click image for full size)

Several of these exceptions to the missing data are revealing because for some sensors they occur at or around comparatively rare spikes in chemical concentrations. Of note is the peak in Methylosmolene in Sensor 3 that surrounds the reading for the 2nd August missing data timestamp (Figure 3). This and the following observations suggest there may be some association between the points of consistently missing data and possible unusual chemical readings.

Figure 3
Figure 3 Zoomed in Sensor 3 chart showing exception to 2nd August data gap coincides with Methylosmolene spike. (click image for full size)

Midnight on the 7th December invites particular scrutiny as it occurs around the time of major peaks in Appluimonia (sensor 6, Figure 4), Methylosmolene and AGOC-3A (sensors 7 and 8, Figures 5 and 6).

Figure 4
Figure 4 Zoomed in Sensor 6 chart showing exception to 7th December data gap coincides with spike in AGOC-3A and Appluimonia. (click image for full size)
Figure 5
Figure 5 Zoomed in Sensor 7 chart showing the AGOC-3A exception to 7th December data gap coincides with spike in Methylosmolene. (click image for full size)
Figure 6
Figure 6 Zoomed in Sensor 8 chart showing the Methylosmolene and AGOC-3A exceptions to 7th December data gap coincide with large spike in Methylosmolene and moderate spike in AGOC-3A. (click image for full size)

Missing data at these points may be masking other peaks and require further investigation.

The second pattern observable from Figure 1 is a set of apparently missing Methylosmolene readings in all sensors (but especially sensors 4, 5 and 6). In every case other than those noted above, these missing values coincide with double readings attributed to AGOC-3A in the same sensor. This is shown in Figure 7, where duplicate readings are symbolised as green discs with vertical lines marking the exact timestamps at which they occur. In every case these are aligned with (i.e. occur at the same time as) missing Methylosmolene values (red discs on the bottom row of each sensor).

Figure 7
Figure 7 Zoomed in portion of sensors 5 and 6 readings showing alignment of duplicate (green discs) and missing values (red discs) and the (possible) correlation with spikes of AGOC-3A concentrations. (click image for full size)
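
The check for gaps and duplicates described above can be sketched as follows. This is a minimal Python illustration, not the Processing code used for this submission. The record layout, a list of (sensor, chemical, timestamp) tuples, and the function names are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

CHEMICALS = ["AGOC-3A", "Appluimonia", "Chlorodinine", "Methylosmolene"]

def find_gaps_and_duplicates(records, start, end):
    """Scan every hourly slot between start and end (inclusive).

    records: iterable of (sensor, chemical, timestamp) tuples (assumed layout).
    Returns two lists of (sensor, chemical, timestamp): slots with no
    reading, and slots with more than one reading.
    """
    counts = Counter(records)
    sensors = sorted({sensor for sensor, _, _ in counts})
    missing, duplicates = [], []
    t = start
    while t <= end:
        for sensor in sensors:
            for chem in CHEMICALS:
                n = counts[(sensor, chem, t)]
                if n == 0:
                    missing.append((sensor, chem, t))
                elif n > 1:
                    duplicates.append((sensor, chem, t))
        t += timedelta(hours=1)
    return missing, duplicates

def aligned_slots(missing, duplicates):
    """(sensor, timestamp) pairs where a missing Methylosmolene value
    coincides with a duplicated AGOC-3A value in the same sensor."""
    gaps = {(s, t) for s, c, t in missing if c == "Methylosmolene"}
    dups = {(s, t) for s, c, t in duplicates if c == "AGOC-3A"}
    return sorted(gaps & dups)
```

Calling aligned_slots on the two returned lists gives the timestamps at which a red disc and a green disc would line up in Figure 7.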

Assuming that these duplicate/missing readings are the correct values but have been misallocated to the wrong chemical type, the following procedure was applied. For each sensor, the mean and standard deviation were calculated over the three month period for each chemical type. Labelling each pair of AGOC-3A duplicates as D1 and D2, there are two possible allocations: either

D1 -> AGOC-3A and D2 -> Methylosmolene
or
D1 -> Methylosmolene and D2 -> AGOC-3A

The z-scores (number of standard deviations away from the mean) for both possible allocations were calculated and the option with the lowest sum of squared z-scores was automatically selected. In other words, each value was allocated to the distribution that it more typically represented.
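
The allocation rule can be illustrated with a short Python sketch. The function names and the (mean, standard deviation) tuple layout are assumptions for the example, and the statistics are taken as already computed per sensor and chemical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from mean."""
    return (value - mean) / sd

def allocate(d1, d2, agoc_stats, meth_stats):
    """Assign a duplicate pair (d1, d2) to AGOC-3A and Methylosmolene.

    agoc_stats and meth_stats are (mean, sd) tuples for the sensor.
    Returns (agoc_value, methylosmolene_value) for the allocation with
    the lower sum of squared z-scores.
    """
    # Option 1: D1 -> AGOC-3A, D2 -> Methylosmolene
    cost_1 = z_score(d1, *agoc_stats) ** 2 + z_score(d2, *meth_stats) ** 2
    # Option 2: D1 -> Methylosmolene, D2 -> AGOC-3A
    cost_2 = z_score(d2, *agoc_stats) ** 2 + z_score(d1, *meth_stats) ** 2
    return (d1, d2) if cost_1 <= cost_2 else (d2, d1)
```

For example, with made-up statistics of (1.0, 0.5) for AGOC-3A and (3.0, 1.0) for Methylosmolene, the pair D1 = 2.9, D2 = 1.1 would be allocated as AGOC-3A = 1.1 and Methylosmolene = 2.9, since each value then sits close to its own distribution.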

Figure 8 shows some examples of this allocation. It can be seen that spikes occur in both AGOC-3A and Methylosmolene for these allocated values, so there remains the possibility that the values themselves are incorrect, not simply allocated to the wrong group.

Figure 8
Figure 8 Zoomed in portion of sensors 5 and 6 readings showing transfer of duplicate readings from AGOC-3A to Methylosmolene (original duplicates shown as solid green discs, transferred values as green circles). Spikes remain at these points in both AGOC-3A and Methylosmolene. (click image for full size)

There are a number of related possible explanations for the patterns seen.

  1. Some error in the sensor readings for AGOC-3A and Methylosmolene results in data being attributed to the wrong chemical type.
  2. High concentrations of one or more chemicals could be the cause of the error.
  3. The error above could result in erroneously high readings.
  4. There could be some deliberate malicious attempt to hide high readings.

Finally, there is a likely problem with Sensor 4 that shows a consistent increase in chemical concentration readings over time.

Figure 9
Figure 9 Drift in Sensor 4 readings revealed in CUSUM chart. Here the cumulative deviation from expected readings (based on the first week of April) is shown. All 4 chemical readings show an apparent gradual increase in the underlying concentration in addition to noisy day-to-day variation and irregular spikes. (click image for full size)

There is a small possibility that this trend could be triggered by genuine local environmental change, but given this trend is not detected by other nearby sensors, this is regarded as low probability.

MC2.2 Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data?

Figure 10 provides an overview of the trends in chemical detection for all sensors. The CUSUM (Cumulative Sum) chart shows the cumulative z-score over time, which allows trends to be detected more easily than in the equivalent raw sensor readings (shown in Figure 1). 'Normal' behaviour was modelled on the mean and variance for week 1 of all sensors. If, beyond that week, chemical levels are consistently above or below normal behaviour, the CUSUM line moves above or below the baseline. This shows that in general levels of all chemicals were higher by December than they were in April (bars thicker and above the baseline towards the right of Figure 10).

Figure 10
Figure 10 CUSUM charts for all sensors/chemicals over time. (click image for full size)
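
A minimal Python sketch of the CUSUM calculation follows. It assumes one hourly series per sensor and chemical, a baseline of the first 168 readings (one week), and the population standard deviation. Whether the baseline should be computed per series or pooled across sensors is not settled by the description above, so the per-series choice here is an assumption.

```python
from itertools import accumulate
from statistics import mean, pstdev

def cusum_z(readings, baseline_hours=24 * 7):
    """Cumulative sum of z-scores relative to a baseline period.

    readings: hourly values for one sensor and one chemical.
    baseline_hours: number of leading readings that define 'normal'.
    """
    baseline = readings[:baseline_hours]
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance")
    # A run of above-normal readings makes the running total climb;
    # readings at the baseline level leave it flat.
    return list(accumulate((r - mu) / sigma for r in readings))
```

A series that stays at its week 1 level produces a line that hovers around zero, while a sustained drift such as that of Sensor 4 produces a steadily rising line.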

Discounting the apparent increase in the probably erroneous Sensor 4, some of the largest increases in concentration are for Methylosmolene (sensors 2, 5, 7, 8 and 9), Chlorodinine (sensors 1, 5 and 9) and Appluimonia (sensor 5). Sensor 5 shows a general increase in three of the four chemicals, but the fact that AGOC-3A does not appear to increase suggests it does not have the same recording problem exhibited by Sensor 4. However, given the spatial proximity of sensors 4 and 5, both sensors should be checked in order to rule in or out the possibility of serious local contamination.

Figures 11-14 show in more detail the concentration levels of all four chemicals as reported by the set of 9 sensors for the three month periods.

Figure 11
Figure 11 Square root of AGOC-3A levels over time. Taking the square root of ppm concentrations reduces height of extreme peaks revealing detail in lower concentration values. All 9 sensors scaled equally. (click image for full size)
Figure 12
Figure 12 Square root of Appluimonia levels over time. All 9 sensors scaled equally. (click image for full size)
Figure 13
Figure 13 Square root of Chlorodinine levels over time. All 9 sensors scaled equally. (click image for full size)
Figure 14
Figure 14 Square root of Methylosmolene levels over time. All 9 sensors scaled equally. (click image for full size)

All four chemicals show a typical, largely random, noise component with occasional 'spikes' of much higher concentrations, typically 5-20 standard deviations from background levels. The most extreme spikes occur for AGOC-3A and Methylosmolene (noise showing smaller variation when scaled by maximum peak value in Figures 11 and 14). Appluimonia shows the least spiky distribution (Figure 12).

Figures 11-14 also show the anomalous behaviour of Sensor 4 for all chemicals, suggesting at least a large part of the trend in apparently increasing concentrations is erroneous. The fact that Sensor 5 does not show a similar pattern lends further support to the observation above that the rise in levels of Appluimonia, Chlorodinine and Methylosmolene in that area is genuine rather than the product of sensor malfunction.

In addition to Sensor 5, Sensor 9 saw an increase in levels for all chemicals from the end of August (23-29th) and through December. Sensors 5 and 9 are geographically proximal and are the sensors closest to the interior of the park. The increase isn't immediately obvious from the raw concentration charts (Figures 11-14), but is revealed by the CUSUM charts (Sensor 9 shown in Figure 15). Up until the 23rd of August, detected levels are reasonably stable, but beyond that point we observe a trend of increasing concentrations due to a combination of increased spike frequency and higher general background levels. This is most strongly evident in Chlorodinine but is also present in the other three chemicals.

Figure 15
Figure 15 CUSUM charts for Sensor 9 showing increase in all chemical concentrations from end of August and through December (click image for full size)

MC2.3 Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

Spatial analysis of the sensor readings suggests the factories primarily responsible for high concentration chemical releases are as follows:

Kasios Office Furniture: AGOC-3A, Methylosmolene
Indigo Sol Boards: Appluimonia 
Roadrunner Fitness Electronics: Chlorodinine 
Radiance ColourTek: No detectable environmental pollution


Spatial analysis and visualization were performed largely with a zoomable map view showing the positions of the sensors and factories, along with a timeline view showing a summary of high concentration detection events (see Figure 16). The map view was constructed as part of an integrated spatial analysis and visualization used for all three VAST Mini-challenges.

Figure 16
Figure 16 Map and timeline view of the sensors and factory locations showing chemical detection events by sensor (vertically ordered) and chemical (colour hue). (click image for full size)

To detect the spread of windborne pollutants, selected peaks in concentration were displayed on the map view as probability cones based on the measured wind direction and strength at the time of the event. The probability cone shows the most likely source of the detected chemical by tracing a vector back in the opposite direction of the wind, expanding the likely region with distance away from sensor. The threshold that defines a detection event visualized in this way can be changed interactively. An example of events detected at 2pm on August 21st is shown in Figure 17.

Figure 17
Figure 17 Example of two detection events shown on 21st August, 14:00 by Sensors 4 and 9 in map and timeline views. Cones show probable source direction based on measured wind direction and speed at time of event. Wind rose on left shows summary of prevailing wind (yellow segments) and direction and speed of wind at selected event time (white arrow). (click image for full size)
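
The geometric test behind a probability cone can be sketched in Python as below. This is an illustration only: it assumes planar map coordinates with y increasing northwards, a wind direction given as the compass bearing the wind blows from, and a hypothetical fixed half-angle. A fixed angular width means the cone widens linearly with distance from the sensor. The sketch does not model wind speed.

```python
import math

def bearing_deg(from_xy, to_xy):
    """Compass bearing (0 = north, 90 = east) from one point to another."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360

def in_source_cone(sensor_xy, candidate_xy, wind_from_deg,
                   half_angle_deg=15.0, max_dist=None):
    """True if candidate_xy lies inside the upwind cone of the sensor.

    wind_from_deg: bearing the wind blows from (assumed convention).
    half_angle_deg, max_dist: hypothetical tuning parameters.
    """
    dist = math.dist(sensor_xy, candidate_xy)
    if dist == 0:
        return True
    if max_dist is not None and dist > max_dist:
        return False
    # Smallest absolute angle between the upwind direction and the
    # bearing from the sensor to the candidate source.
    diff = abs((bearing_deg(sensor_xy, candidate_xy) - wind_from_deg + 180) % 360 - 180)
    return diff <= half_angle_deg
```

With a sensor at the origin and a candidate source due west of it, in_source_cone returns True for a westerly wind (270 degrees) and False for a north-westerly wind (315 degrees).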

By combining probability cones for all chemical concentration peaks, a composite picture is created (Figure 18) showing a spatial structure to the events when considered by chemical type. This composite suggests Indigo Sol Boards as the likely source for (orange) Appluimonia, Roadrunner for (green) Chlorodinine and possibly Kasios for (pink) Methylosmolene and (blue) AGOC-3A. However, more convincing evidence is provided by filtering by chemical type.

Figure 18
Figure 18 Composite of all chemical detection events of at least 5.5 standard deviations from background levels. Wind probability cones show likely origin of chemicals. (click image for full size)
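
Selecting detection events by a standard deviation threshold might look like the following Python sketch. How 'background levels' were estimated is not specified here, so the sketch assumes the mean and population standard deviation of the whole series, and it counts only readings above the background. The function name and default threshold are illustrative.

```python
from statistics import mean, pstdev

def detect_events(readings, threshold_sd=5.5):
    """Indices of readings at least threshold_sd standard deviations
    above the background level (assumed: mean of the whole series)."""
    mu = mean(readings)
    sigma = pstdev(readings)
    if sigma == 0:
        return []
    return [i for i, r in enumerate(readings)
            if (r - mu) / sigma >= threshold_sd]
```

Lowering threshold_sd admits more events, which corresponds to the interactive threshold adjustment in the map view.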

Examining just the extreme AGOC-3A detection events (Figure 19), we see the most likely origins are in the region of the Roadrunner Fitness and Kasios Office factories. However, Sensor 6 suggests Roadrunner is an unlikely source, given that very few detection events occurred with the prevailing NW wind (which would have carried the chemical from Roadrunner if it had been the source). In contrast, the AGOC-3A detected by the sensor was carried almost exclusively by westerly winds. With Kasios being almost due west of Sensor 6, this is the most likely origin.

Figure 19
Figure 19 AGOC-3A detection events of at least 6.6 standard deviations from background levels. (click image for full size)

Similar reasoning can be applied to Methylosmolene, originating from the same source (Figure 20).

Figure 20
Figure 20 Methylosmolene detection events of at least 5 standard deviations from background levels. (click image for full size)

Once wind direction is taken into account, the distribution of Appluimonia events can be seen as spatially distinct and uniquely focussed around Indigo Sol Boards (Figure 21). Note also that while Sensor 9 provides the primary positive evidence, Sensor 5 also supports this. As noted above, both of these sensors showed an increase in detection levels over time, suggesting that emissions from Indigo Sol Boards have increased since late August.

Figure 21
Figure 21 Appluimonia detection events of at least 4.9 standard deviations from background levels with single event on the 21st August 16:00 highlighted, marking the start of a period of increased emissions. (click image for full size)

Finally, evidence for the origin of the Chlorodinine emissions is provided in Figure 22. Sensor 6 is again particularly discriminating, ruling out Kasios as a source and instead providing compelling evidence for Roadrunner, which is located NW of the sensor.

Figure 22
Figure 22 Chlorodinine detection events of at least 6 standard deviations from background levels with single event on the 27th August 2am highlighted. (click image for full size)